In [1]:
library(Seurat)
library(ggplot2)
library(dplyr)
options(repr.plot.width=10, repr.plot.height=8)
Loading required package: SeuratObject
Loading required package: sp
‘SeuratObject’ was built under R 4.4.1 but the current version is
4.4.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI
for R may have changed
‘SeuratObject’ was built with package ‘Matrix’ 1.6.5 but the current
version is 1.7.3; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
intersect, t
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
In [2]:
in_path <- "~/projects/2024/RA/GSE136035_RAW/"
out_path <- "~/projects/2024/RA/GSE136035_RAW/output/"
In [3]:
f_barcodes <- dir(path = in_path, pattern = "*barcodes.tsv.gz")
f_barcodes
- 'GSM4039805_165_barcodes.tsv.gz'
- 'GSM4039807_178_barcodes.tsv.gz'
- 'GSM4039809_182_barcodes.tsv.gz'
- 'GSM4039813_177_barcodes.tsv.gz'
- 'GSM4039815_178_T1_barcodes.tsv.gz'
- 'GSM4039819_177_T1_barcodes.tsv.gz'
- 'GSM4885423_HC2_barcodes.tsv.gz'
- 'GSM4885425_HC3_barcodes.tsv.gz'
In [4]:
d_name <- character()
f_features <- character()
f_matrix <- character()
for (i in 1:length(f_barcodes)) {
d_name[i] <- substr(f_barcodes[i], 1, 10)
f_features[i] <- paste0(substr(f_barcodes[i], 1, (nchar(f_barcodes[i])-15)),"genes.tsv.gz")
f_matrix[i] <- paste0(substr(f_barcodes[i], 1, (nchar(f_barcodes[i])-15)),"matrix.mtx.gz")
if (!file.exists(paste0(in_path,d_name[i],"/barcodes.tsv.gz"))) {
dir.create(paste0(in_path,d_name[i]))
file.copy(paste0(in_path,f_barcodes[i]), paste0(in_path,d_name[i],"/barcodes.tsv.gz"))
file.copy(paste0(in_path,f_features[i]), paste0(in_path,d_name[i],"/features.tsv.gz"))
file.copy(paste0(in_path,f_matrix[i]), paste0(in_path,d_name[i],"/matrix.mtx.gz"))
}
}
In [5]:
scdata <- list()
for (i in 1:length(f_barcodes)) {
data <- Read10X(data.dir = paste0(in_path,d_name[i]))
scdata[[i]] <- CreateSeuratObject(counts = data, project = d_name[i])
i
}
scdata
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
10X data contains more than one type and is being returned as a list containing matrices of each type.
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
10X data contains more than one type and is being returned as a list containing matrices of each type.
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
[[1]] An object of class Seurat 33694 features across 2704 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[2]] An object of class Seurat 33694 features across 667 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[3]] An object of class Seurat 33694 features across 1862 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[4]] An object of class Seurat 33694 features across 2079 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[5]] An object of class Seurat 33694 features across 912 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[6]] An object of class Seurat 33694 features across 1173 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[7]] An object of class Seurat 33542 features across 9830 samples within 1 assay Active assay: RNA (33542 features, 0 variable features) 2 layers present: counts.Gene Expression, counts.Antibody Capture [[8]] An object of class Seurat 33542 features across 8893 samples within 1 assay Active assay: RNA (33542 features, 0 variable features) 2 layers present: counts.Gene Expression, counts.Antibody Capture
In [6]:
GSE136035 <- merge(scdata[[1]], y = scdata[-1], add.cell.ids = d_name)
In [7]:
GSE136035
An object of class Seurat 45072 features across 28120 samples within 1 assay Active assay: RNA (45072 features, 0 variable features) 10 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.Gene Expression.GSM4885423, counts.Antibody Capture.GSM4885423, counts.Gene Expression.GSM4885425, counts.Antibody Capture.GSM4885425
In [8]:
GSE136035[["percent.mt"]] <- PercentageFeatureSet(GSE136035, pattern = "^MT-")
In [9]:
GSE136035
An object of class Seurat 45072 features across 28120 samples within 1 assay Active assay: RNA (45072 features, 0 variable features) 10 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.Gene Expression.GSM4885423, counts.Antibody Capture.GSM4885423, counts.Gene Expression.GSM4885425, counts.Antibody Capture.GSM4885425
In [10]:
VlnPlot(GSE136035, features = c("nFeature_RNA"), pt.size = 0)
VlnPlot(GSE136035, features = c("nCount_RNA"), pt.size = 0)
VlnPlot(GSE136035, features = c("percent.mt"), pt.size = 0)
Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.” Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
In [11]:
p1 <- FeatureScatter(GSE136035, feature1 = "nCount_RNA", feature2 = "percent.mt", raster = F)
p2 <- FeatureScatter(GSE136035, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", raster = F)
p1
p2
In [12]:
GSE136035 <- subset(GSE136035, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt <20)
GSE136035 <- JoinLayers(GSE136035)
GSE136035
An object of class Seurat 45072 features across 27598 samples within 1 assay Active assay: RNA (45072 features, 0 variable features) 1 layer present: counts
In [13]:
GSE136035[["RNA"]] <- split(GSE136035[["RNA"]], f = GSE136035$orig.ident)
In [14]:
GSE136035
An object of class Seurat 45072 features across 27598 samples within 1 assay Active assay: RNA (45072 features, 0 variable features) 8 layers present: counts.GSM4039805, counts.GSM4039807, counts.GSM4039809, counts.GSM4039813, counts.GSM4039815, counts.GSM4039819, counts.GSM4885423, counts.GSM4885425
In [15]:
GSE136035 <- JoinLayers(GSE136035)
In [16]:
GSE136035 <- NormalizeData(GSE136035, normalization.method = "LogNormalize", scale.factor = 10000)
GSE136035 <- FindVariableFeatures(GSE136035, selection.method = "vst", nfatures = 2000)
GSE136035 <- ScaleData(GSE136035, features = rownames(GSE136035))
GSE136035 <- RunPCA(GSE136035)
Normalizing layer: counts Finding variable features for layer counts Centering and scaling data matrix PC_ 1 Positive: FCN1, SERPINA1, LYZ, S100A9, S100A8, CFD, CEBPD, CSTA, CLEC7A, APOBEC3A CD68, CLEC12A, LGALS2, FPR1, LILRA2, PILRA, AIF1, MS4A6A, LILRB2, LRP1 LST1, S100A12, CSF3R, CD36, MNDA, LILRB3, TNFSF13B, C5AR1, TYMP, CD300E Negative: ATP5MC3, ATP5F1A, ATP5PF, ATP5MF, SELENOK, IL7R, IGHM, PASK, ATP5MC1, BUD23 TENT5C, CD27, ELOC, AL139020.1, CENPX, ARL6IP1, IGLL5, LINC01781, MESD, IGHV5-78 LINC01857, TRAT1, IGHV1-69D, JPT1, GATD3B, LIME1, EAF2, AC078883.1, SRP19, JCHAIN PC_ 2 Positive: LYZ, S100A8, FCN1, S100A9, AIF1, CSTA, SERPINA1, S100A12, FTH1, FPR1 CLEC7A, LST1, LGALS2, CSF3R, LRP1, CLEC12A, MS4A6A, CD36, APOBEC3A, CFD CFP, RNF130, LILRA2, APLP2, C5AR1, TNFSF13B, CD163, PILRA, LILRB3, MPEG1 Negative: TNFRSF17, MZB1, HRASLS2, DERL3, FKBP11, JCHAIN, SDF2L1, PPIB, TYMS, HSP90B1 PDIA4, SSR3, MANF, GPRC5D, MYDGF, CAV1, SUB1, CHPF, BIRC5, ITM2C ABCB9, PDIA6, PRDX4, SEC61B, RRM2, SPCS3, HSPA5, ELL2, BIK, UBE2J1 PC_ 3 Positive: MTRNR2L12, AC090498.1, ATP5E, ATP5L, ATP5D, ATP5B, ATPIF1, ATP5I, ATP5G3, UQCRHL ATP5J2, USMG5, ATP5A1, C14orf2, ATP5F1, MTRNR2L8, ATP5C1, CD74, ATP5G1, SELT ATP5H, HLA-DQA2, RP5-1171I10.5, SELK, KIAA0226L, RP11-731F5.2, RPS4Y1, IGHM, FCRL3, TMEM2 Negative: ATP5MC3, ATP5F1A, ATP5MF, ATP5PF, IL7R, SELENOK, S100A4, SRGN, ATP5MC1, CRIP1 PASK, CD27, S100A10, JPT1, LIME1, RAB5IF, ELOC, MESD, BUD23, NUCB2 AIF1, GZMM, CENPX, CYTOR, ZYX, CFP, FCN1, TRAT1, S100A6, ARL6IP1 PC_ 4 Positive: CD74, IGHM, JCHAIN, EAF2, IGLL5, MZB1, IGKC, CYBB, SPI1, POU2AF1 ITM2C, RHOB, DERL3, UBE2J1, TNFRSF17, IGHV5-78, HERPUD1, IGLC2, PLD4, MPEG1 PPP1R14A, SEMA4A, LILRB4, CPNE5, APLP2, LINC01781, TENT5C, KLHL14, IGHA1, IGHA2 Negative: GNLY, PRF1, GZMA, KLRD1, GZMB, CTSW, CCL5, FGFBP2, GZMH, CLIC3 KLRF1, GZMM, CCL4, IL2RB, ADGRG1, SH2D1B, S1PR5, SAMD3, IGFBP7, S100A4 IL7R, FCGR3A, CD160, PTGDR, TTC38, TBX21, C1orf21, DOK2, ID2, FCRL6 PC_ 5 Positive: GZMB, GNLY, KLRD1, FGFBP2, KLRF1, CD74, PRF1, GZMH, IGFBP7, CLIC3 CCL4, SH2D1B, FCGR3A, GZMA, ADGRG1, S1PR5, TTC38, CD160, IGHM, HOPX CCL5, KLRC1, XCL2, FCRL6, PTGDR, EAF2, C1orf21, IL2RB, JCHAIN, RHOC Negative: IL7R, TRAT1, CISH, NPDC1, CD27, S100A11, S100A4, FYB, LIME1, S100A10 S100A6, PASK, GATA3, SLC40A1, TIMP1, CITED2, ZYX, GAPDH, ATP5H, ITGB1 LDHA, ATP5L, USMG5, PCSK1N, IFI6, PTGER2, TXN, CRIP2, CRIP1, ACTG1
In [17]:
ElbowPlot(GSE136035)
In [18]:
GSE136035 <- FindNeighbors(GSE136035, dims = 1:15, reduction = "pca")
GSE136035 <- FindClusters(GSE136035, resolution = 0.5, cluster.name = "unintegrated_clusters")
Computing nearest neighbor graph Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck Number of nodes: 27598 Number of edges: 894220 Running Louvain algorithm... Maximum modularity in 10 random starts: 0.9305 Number of communities: 20 Elapsed time: 5 seconds
In [19]:
GSE136035 <- RunUMAP(GSE136035, dims = 1:15, reduction = "pca", reduction.name = "umap.unintegrated")
Warning message: “The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation' This message will be shown once per session” 12:17:38 UMAP embedding parameters a = 0.9922 b = 1.112 12:17:38 Read 27598 rows and found 15 numeric columns 12:17:38 Using Annoy for neighbor search, n_neighbors = 30 12:17:38 Building Annoy index with metric = cosine, n_trees = 50 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * | 12:17:41 Writing NN index file to temp file /tmp/RtmpRGOn89/file32c80380fb5a4 12:17:41 Searching Annoy index using 1 thread, search_k = 3000 12:17:51 Annoy recall = 100% 12:17:52 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 12:17:54 Initializing from normalized Laplacian + noise (using RSpectra) 12:17:55 Commencing optimization for 200 epochs, with 1202276 positive edges 12:17:55 Using rng type: pcg 12:18:13 Optimization finished
In [20]:
GSE136035
Idents(GSE136035) <- GSE136035$orig.ident
DimPlot(GSE136035, reduction = "umap.unintegrated")
An object of class Seurat 45072 features across 27598 samples within 1 assay Active assay: RNA (45072 features, 2000 variable features) 3 layers present: counts, data, scale.data 2 dimensional reductions calculated: pca, umap.unintegrated
In [21]:
#GSE136035 <- IntegrateLayers(
# object = GSE136035, method = HarmonyIntegration,
# orig.reduction = "pca", new.reduction = "harmony",
# verbose = FALSE
#)
In [22]:
library(harmony)
GSE136035 <- RunHarmony(GSE136035,"orig.ident", plot_convergence = T)
Loading required package: Rcpp Transposing data matrix Initializing state using k-means centroids initialization Harmony 1/10 Harmony 2/10 Harmony 3/10 Harmony converged after 3 iterations
In [23]:
harmony_embeddings <- Embeddings(GSE136035, "harmony")
harmony_embeddings[1:5, 1:5]
| harmony_1 | harmony_2 | harmony_3 | harmony_4 | harmony_5 | |
|---|---|---|---|---|---|
| GSM4039805_AAACCTGAGGCATGGT-1 | 1.5389411 | -1.67645223 | 6.9010755 | 0.2210354 | -1.509808278 |
| GSM4039805_AAACCTGCACATGTGT-1 | -0.3147648 | 0.08135463 | 0.6775997 | -3.6687835 | -5.617223173 |
| GSM4039805_AAACCTGCAGTTCATG-1 | 3.0815150 | -2.30729387 | 7.1943725 | -2.3830826 | 1.419970846 |
| GSM4039805_AAACCTGGTATCAGTC-1 | 0.5034509 | 0.12221361 | 6.1371670 | 1.0243512 | 0.106980311 |
| GSM4039805_AAACCTGGTGCCTGGT-1 | 2.5056201 | -1.09620563 | 7.6866434 | 1.6971614 | -0.002098452 |
In [24]:
DimPlot(GSE136035, reduction = "harmony", group.by = "orig.ident",raster = F)
In [25]:
GSE136035 <- RunUMAP(GSE136035, dims = 1:15, reduction = "harmony")
12:18:45 UMAP embedding parameters a = 0.9922 b = 1.112 12:18:45 Read 27598 rows and found 15 numeric columns 12:18:45 Using Annoy for neighbor search, n_neighbors = 30 12:18:45 Building Annoy index with metric = cosine, n_trees = 50 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * | 12:18:49 Writing NN index file to temp file /tmp/RtmpRGOn89/file32c8032d7ce10c 12:18:49 Searching Annoy index using 1 thread, search_k = 3000 12:18:59 Annoy recall = 100% 12:19:00 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 12:19:01 Initializing from normalized Laplacian + noise (using RSpectra) 12:19:03 Commencing optimization for 200 epochs, with 1212172 positive edges 12:19:03 Using rng type: pcg 12:19:20 Optimization finished
In [26]:
GSE136035 <- FindNeighbors(GSE136035, reduction = "harmony", dims = 1:15)
GSE136035 <- FindClusters(GSE136035, resolution = 0.5, cluster.name = "harmony_clusters")
Computing nearest neighbor graph Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck Number of nodes: 27598 Number of edges: 885726 Running Louvain algorithm... Maximum modularity in 10 random starts: 0.9142 Number of communities: 17 Elapsed time: 5 seconds
In [27]:
DimPlot(GSE136035, raster = F, reduction = "umap")
DimPlot(GSE136035, raster = F, reduction = "umap", group.by = "orig.ident")
pdf(paste0(out_path, "UMAP_harmony.pdf"), width=8, height=6)
DimPlot(GSE136035, raster = F, reduction = "umap")
DimPlot(GSE136035, raster = F, reduction = "umap", group.by = "orig.ident")
dev.off()
pdf: 2
In [28]:
ABC_marker_1 <- c("CD19", "CD80", "CD86", "FAS", "FCRLA", "FCRLB", "FGL2", "CXCL10", "CXCR3", "NKG7", "ITGAM", "ITGB2", "TBX21", "ZBTB32", "FCER2", "CR2")
length(ABC_marker_1)
is.element(ABC_marker_1,rownames(GSE136035))
ABC_marker_1 <- ABC_marker_1[is.element(ABC_marker_1,rownames(GSE136035))]
16
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
In [29]:
FeaturePlot(GSE136035, features = ABC_marker_1[1:4], ncol = 2, reduction = "umap")
In [30]:
fig1_1 <- FeaturePlot(GSE136035, features = ABC_marker_1[1:9], ncol = 3, reduction = "umap")
fig1_2 <- FeaturePlot(GSE136035, features = ABC_marker_1[10:16], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "ABC_marker_feature.pdf"), width=12, height=10)
fig1_1
fig1_2
dev.off()
pdf: 2
In [31]:
source("~/projects/MK_Rcode/setZ.R")
GSE136035[["ABC_score"]] <- setZ(ABC_marker_1,rownames(GSE136035@assays$RNA$scale.data),GSE136035@assays$RNA$scale.data)
In [32]:
LSP1_gene <- c("CEBPA", "CEBPB", "IL1B", "CCL3", "TNF", "CCL4", "CCL5", "SPP1", "CD7", "MSR1", "ITGAM", "KIT", "CD14", "CSF3R", "IL1R2", "CSF1R", "ITGA1", "ITGA5", "ITGB5", "ITGB1", "CAMP", "THBS1", "C3", "MEFV", "FOS", "CYBB", "CARD9", "IRF7", "OAS3", "NLRP3", "CTSB", "IKBKE", "MPO")
length(LSP1_gene)
LSP1_gene_sub <- LSP1_gene[is.element(LSP1_gene,rownames(GSE136035))]
33
In [33]:
LSP1_gene_sub
- 'CEBPA'
- 'CEBPB'
- 'IL1B'
- 'CCL3'
- 'TNF'
- 'CCL4'
- 'CCL5'
- 'SPP1'
- 'CD7'
- 'MSR1'
- 'ITGAM'
- 'KIT'
- 'CD14'
- 'CSF3R'
- 'IL1R2'
- 'CSF1R'
- 'ITGA1'
- 'ITGA5'
- 'ITGB5'
- 'ITGB1'
- 'CAMP'
- 'THBS1'
- 'C3'
- 'MEFV'
- 'FOS'
- 'CYBB'
- 'CARD9'
- 'IRF7'
- 'OAS3'
- 'NLRP3'
- 'CTSB'
- 'IKBKE'
- 'MPO'
In [34]:
fig2_1 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[1:9], ncol = 3, reduction = "umap")
fig2_2 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[10:18], ncol = 3, reduction = "umap")
fig2_3 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[19:27], ncol = 3, reduction = "umap")
fig2_4 <- FeaturePlot(GSE136035, features = LSP1_gene_sub[28:33], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "LSP1_marker_feature.pdf"), width=12, height=10)
fig2_1
fig2_2
fig2_3
fig2_4
dev.off()
pdf: 2
In [35]:
GSE136035[["LSP1_score"]] <- setZ(LSP1_gene_sub,rownames(GSE136035@assays$RNA$scale.data),GSE136035@assays$RNA$scale.data)
In [36]:
head(GSE136035$LSP1_score)
- GSM4039805_AAACCTGAGGCATGGT-1
- -0.122413421523912
- GSM4039805_AAACCTGCACATGTGT-1
- -0.267874338036728
- GSM4039805_AAACCTGCAGTTCATG-1
- -1.44750080367965
- GSM4039805_AAACCTGGTATCAGTC-1
- -1.39620127022624
- GSM4039805_AAACCTGGTGCCTGGT-1
- -0.121238624479262
- GSM4039805_AAACCTGTCTATCGCC-1
- 0.3778473427849
In [37]:
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap")
GSE136035$LSP1_score_pos <- GSE136035$LSP1_score > 0
DimPlot(GSE136035, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE136035, features = "ABC_score") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap")
GSE136035$ABC_score_pos <- GSE136035$ABC_score > 0
DimPlot(GSE136035, group.by = "ABC_score_pos", reduction = "umap")
pdf(paste0(out_path, "score_feature.pdf"), width=8, height=6)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "LSP1_score", reduction = "umap")
DimPlot(GSE136035, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE136035, features = "ABC_score", reduction = "umap")
DimPlot(GSE136035, group.by = "ABC_score_pos", reduction = "umap")
dev.off()
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale. Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
pdf: 2
In [38]:
B_marker <- c("CD19", "CD79A", "CD79B", "MS4A1", "CD27", "CD24", "CD38", "IGHD")
T_marker <- c("CD4", "CD8A", "CD69", "CD70", "CD80", "CD40LG", "PDCD1", "HAVCR2")
FeaturePlot(GSE136035, features = B_marker, ncol = 3)
FeaturePlot(GSE136035, features = T_marker, ncol = 3)
pdf(paste0(out_path, "marker_expression.pdf"), width=12, height=12)
FeaturePlot(GSE136035, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE136035, features = T_marker, ncol = 3, reduction = "umap")
dev.off()
pdf: 2
In [39]:
LSP1_pos_ind <- GSE136035$LSP1_score_pos==TRUE & GSE136035$ABC_score_pos==FALSE
ABC_pos_ind <- GSE136035$LSP1_score_pos==FALSE & GSE136035$ABC_score_pos==TRUE
DP_ind <- GSE136035$LSP1_score_pos==TRUE & GSE136035$ABC_score_pos==TRUE
DN_ind <- GSE136035$LSP1_score_pos==FALSE & GSE136035$ABC_score_pos==FALSE
sum(LSP1_pos_ind)
sum(ABC_pos_ind)
sum(DP_ind)
sum(DN_ind)
4169
7077
3945
12407
In [40]:
Idents(GSE136035, cells = colnames(GSE136035)[LSP1_pos_ind]) <- "LSP1_positive"
Idents(GSE136035, cells = colnames(GSE136035)[ABC_pos_ind]) <- "ABC_positive"
Idents(GSE136035, cells = colnames(GSE136035)[DP_ind]) <- "Double_positive"
Idents(GSE136035, cells = colnames(GSE136035)[DN_ind]) <- "Double_negative"
DimPlot(GSE136035, reduction = "umap")
pdf(paste0(out_path,"umap_ABC_LSP1.pdf"), width = 8, height = 6)
DimPlot(GSE136035, reduction = "umap")
dev.off()
pdf: 2
In [41]:
if (!file.exists(paste0(out_path,"allmarkers.rds"))) {
markers <- FindAllMarkers(GSE136035, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.25)
}
if (!exists("markers")) {
markers <- readRDS(paste0(out_path,"allmarkers.rds"))
}
Calculating cluster Double_negative Calculating cluster Double_positive Calculating cluster ABC_positive Calculating cluster LSP1_positive
In [42]:
GSE136035
An object of class Seurat 45072 features across 27598 samples within 1 assay Active assay: RNA (45072 features, 2000 variable features) 3 layers present: counts, data, scale.data 4 dimensional reductions calculated: pca, umap.unintegrated, harmony, umap
In [43]:
markers %>%
group_by(cluster) %>%
filter(avg_log2FC > 0) %>%
filter(p_val_adj < 0.05) %>%
arrange(desc(avg_log2FC)) %>%
slice_head(n = 5) %>%
ungroup() -> top5
# DoHeatmap(pbmc, features = top5$gene) + NoLegend()
DotPlot(GSE136035, features = top5$gene[!duplicated(top5$gene)] ) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
pdf(paste0(out_path, "Dotplot.pdf"), width = 8, height = 6)
DotPlot(GSE136035, features = top5$gene[!duplicated(top5$gene)] ) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
dev.off()
Warning message: “Scaling data with a low number of groups may produce misleading results” Scale for colour is already present. Adding another scale for colour, which will replace the existing scale. Warning message: “Scaling data with a low number of groups may produce misleading results” Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
pdf: 2
In [44]:
unique(Idents(GSE136035))
- ABC_positive
- LSP1_positive
- Double_positive
- Double_negative
Levels:
- 'Double_negative'
- 'Double_positive'
- 'ABC_positive'
- 'LSP1_positive'
In [45]:
LSP1vsABC.marker <- FindMarkers(GSE136035, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.1, ident.1 = "LSP1_positive", ident.2 = "ABC_positive")
source("~/projects/MK_Rcode/volcano_scdata.R")
In [46]:
vol1 <- volcano_scdata(LSP1vsABC.marker, "LSP1 vs ABC")
pdf(paste0(out_path,"LSP1_vs_ABC_volcano.pdf"), width = 6, height = 6)
vol1
dev.off()
pdf: 2
In [47]:
library(clusterProfiler)
library(enrichplot)
library(msigdbr)
library(stringr)
set.seed(2025)
clusterProfiler v4.14.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
Please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan,
X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal
enrichment tool for interpreting omics data. The Innovation. 2021,
2(3):100141
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:stats’:
filter
enrichplot v1.26.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
Please cite:
Guangchuang Yu. Using meshes for MeSH term enrichment and semantic
analyses. Bioinformatics. 2018, 34(21):3766-3767,
doi:10.1093/bioinformatics/bty410
For full functionality, please install the 'msigdbdf' package with:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')
In [48]:
hallmark_geneset <- msigdbr(species = "Homo sapiens", category = "H")
hallmark_geneset$gs_name_short <- str_extract(hallmark_geneset$gs_name, "(?<=HALLMARK_).*$")
sort_glist <- LSP1vsABC.marker$avg_log2FC
names(sort_glist) <- rownames(LSP1vsABC.marker)
sort_glist <- sort(sort_glist, decreasing = T)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T, pvalueCutoff = 0.99)
if (!file.exists(paste0(out_path,"gsea_GSE136035_050725.rds"))) {
saveRDS(gsea_result.LSP1vsABS, paste0(out_path,"gsea_GSE136035_050725.rds"))
}
Warning message:
“The `category` argument of `msigdbr()` is deprecated as of msigdbr 10.0.0.
ℹ Please use the `collection` argument instead.”
The 'msigdbdf' package must be installed to access the full dataset.
Please run the following command to install the 'msigdbdf' package:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
In [49]:
pdf(paste0(out_path, "GSEA_LSP1vsABS.pdf"), width = 8, height = 6)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
dev.off()
pdf: 2
In [50]:
if (!file.exists(paste0(out_path,"GSE136035_metadata.rds"))) {
saveRDS(GSE136035@meta.data, file = paste0(out_path,"GSE136035_metadata.rds"))
}
In [55]:
if (!file.exists(paste0(out_path,"GSE136035_Sobj_050825.rds"))) {
saveRDS(GSE136035, file = paste0(out_path,"GSE136035_Sobj_050825.rds"))
}
In [53]:
out_path
'~/projects/2024/RA/GSE136035_RAW/output/'